Workflow performance improvement using model-based scheduling over multiple clusters and clouds
نویسندگان
چکیده
In recent years, a variety of computational sites and resources have emerged, and users often have access to multiple resources that are distributed. These sites are heterogeneous in nature and performance of different tasks in a workflow varies from one site to another. Additionally, users typically have a limited resource allocation at each site capped by administrative policies. In such cases, judicious scheduling strategy is required in order to map tasks in the workflow to resources so that the workload is balanced among sites and the overhead is minimized in data transfer. Most existing systems either run the entire workflow in a single site or use näıve approaches to disThe submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. ∗Principal corresponding author Email addresses: [email protected] (Ketan Maheshwari), [email protected] (Eun-Sung Jung), [email protected] (Jiayuan Meng), [email protected] (Vitali Morozov), [email protected] (Venkatram Vishwanath), [email protected] (Rajkumar Kettimuthu) Preprint submitted to Future Generation Computer Systems March 19, 2015 tribute the tasks across sites or leave it to the user to optimize the allocation of tasks to distributed resources. This results in a significant loss in productivity. We propose a multi-site workflow scheduling technique that uses performance models to predict the execution time on resources and dynamic probes to identify the achievable network throughput between sites. We evaluate our approach using real world applications using the Swift parallel and distributed execution framework. We use two distinct computational environments–geographically distributed multiple clusters and multiple clouds. We show that our approach improves the resource utilization and reduces execution time when compared to the default schedule.
منابع مشابه
A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints
One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...
متن کاملScheduling Multiple Workflow Using De-De Dodging Algorithm and PBD Algorithm in Cloud: Detailed Study
Workflow scheduling is an important part of cloud computing and based on different criteria it decides cost, execution time, and performances. A cloud workflow system is a platform service facilitating automation of distributed applications based on new cloud infrastructure. An aspect which differentiates cloud workflow system from others is market-oriented business model, an innovation which c...
متن کاملRobust and fault-tolerant scheduling for scientific workflows in cloud computing environments
CLOUD environments offer low-cost computing resources as a subscription-based service. These resources are elastically scalable and dynamically provisioned. Furthermore, new pricing models have been pioneered by cloud providers that allow users to provision resources and to use them in an efficient manner with significant cost reductions. As a result, scientific workflows are increasingly adopt...
متن کاملGreen Energy-aware task scheduling using the DVFS technique in Cloud Computing
Nowdays, energy consumption as a critical issue in distributed computing systems with high performance has become so green computing tries to energy consumption, carbon footprint and CO2 emissions in high performance computing systems (HPCs) such as clusters, Grid and Cloud that a large number of parallel. Reducing energy consumption for high end computing can bring various benefits such as red...
متن کاملSmart Workflow Scheduling using the Hybridization of Random Weight Model with Ant Colony Optimization (RWM-ACO)
The cloud based platforms are designed specifically for the provision of the high performance clusters (HPC), which is realized by using the multiple techniques all together for the realization of the distributed computing environment. The cloud platforms are designed to handle the independent queries either in the groups or individually for the minimization or optimization of the response time...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Future Generation Comp. Syst.
دوره 54 شماره
صفحات -
تاریخ انتشار 2016